Clustering Relational Data Based on Randomized Propositionalization

نویسندگان

  • Grant Anderson
  • Bernhard Pfahringer
چکیده

Clustering of relational data has so far received a lot less attention than classification of such data. In this paper we investigate a simple approach based on randomized propositionalization, which allows for applying standard clustering algorithms like KMeans to multi-relational data. We describe how random rules are generated and then turned into boolean-valued features. Clustering generally is not straightforward to evaluate, but preliminary experimental results on a number of standard ILP datasets show promising results. Clusters generated without class information usually agree well with the true class labels of cluster members, i.e. class distributions inside clusters generally differ significantly from the global class distributions. The two-tiered algorithm described shows good scalability due to the randomized nature of the first step and the availability of efficient propositional clustering algorithms for the second step.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Propositionalization of Relational Learning: An Information Extraction Case Study

This paper develops a new propositionalization approach for relational learning which allows for efficient representation and learning of relational information using propositional means. We develop a relational representation language, along with a relation generation function that produces features in this language in a data driven way; together, these allow efficient representation of the re...

متن کامل

Approximate Relational Reasoning by Stochastic Propositionalization

For many real-world applications it is important to choose the right representation language. While the setting of First Order Logic (FOL) is the most suitable one to model the multi-relational data of real and complex domains, on the other hand it puts the question of the computational complexity of the knowledge induction process. A way of tackling the complexity of such real domains, in whic...

متن کامل

On propositionalization for knowledge discovery in relational databases

Propositionalization is a process that leads from relational data and background knowledge to a single-table representation thereof, which serves as the input to widespread systems for knowledge discovery in databases. Systems for propositionalization thus support the analyst during the usually costly phase of data preparation for data mining. Such systems have been applied for more than 15 yea...

متن کامل

Propositionalization Through Relational Association Rules Mining

In this paper we propose a novel (multi-)relational classification framework based on propositionalization. Propositionalization makes use of discovered relational association rules and permits to significantly reduce feature space through a feature reduction algorithm. The method is implemented in a Data Mining system tightly integrated with a relational database. It performs the classificatio...

متن کامل

Ensemble Relational Learning based on Selective Propositionalization

Dealing with structured data needs the use of expressive representation formalisms that, however, puts the problem to deal with the computational complexity of the machine learning process. Furthermore, real world domains require tools able to manage their typical uncertainty. Many statistical relational learning approaches try to deal with these problems by combining the construction of releva...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007